Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main      #26   +/-   ##
=======================================
  Coverage   77.36%   77.36%
=======================================
  Files          26       26
  Lines        1162     1162
=======================================
  Hits          899      899
  Misses        223      223
  Partials       40       40
mtodor
reviewed
Jan 19, 2026
Enhanced tool descriptions and parameter schemas to better guide LLMs on when to use optional parameters and which tools to select for different query types. Added mcp-testing-framework configuration with 8 test cases covering CVE queries and cluster operations, achieving 87.5% pass rate with GPT-5 models.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>

# Conflicts:
#	internal/toolsets/config/tools.go
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
Fix E2E test assertion failures by improving tool descriptions with
smart usage pattern guidance. Tool descriptions now clearly indicate:
- When to call all three CVE tools for comprehensive coverage
("Is CVE-X detected in my clusters?" without specific cluster name)
- When to call only specific tools for targeted queries
("Is CVE-X detected in cluster staging-central-cluster?")
Changes:
- Update vulnerability tool descriptions (clusters, deployments, nodes)
to use directive language and clear usage patterns
- Adjust cve-nonexistent test maxToolCalls from 2 to 3 to match
comprehensive check pattern
- Update cve-cluster-does-not-exist verification to accept both
"CVE not detected" and "cluster doesn't exist" responses
Results: All 24/24 E2E test assertions now pass (improved from 21/24).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
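The maxToolCalls adjustment above can be pictured with a small Go sketch. It is illustrative only: the `ToolCallBounds` type, its field names, and the tool names in `main` are assumptions made for this example, not the evaluation framework's actual schema. It shows why a limit of 2 rejects the comprehensive three-tool check while a limit of 3 accepts it.

```go
package main

import "fmt"

// ToolCallBounds is a hypothetical stand-in for the assertion fields that
// bound how many tool calls an agent may make for one task.
type ToolCallBounds struct {
	MinToolCalls int
	MaxToolCalls int
}

// Check returns an error when the number of recorded tool calls falls
// outside the configured bounds, and nil otherwise.
func (b ToolCallBounds) Check(calls []string) error {
	n := len(calls)
	if n < b.MinToolCalls {
		return fmt.Errorf("got %d tool calls, want at least %d", n, b.MinToolCalls)
	}
	if n > b.MaxToolCalls {
		return fmt.Errorf("got %d tool calls, want at most %d", n, b.MaxToolCalls)
	}
	return nil
}

func main() {
	// Placeholder names for the three CVE tools (clusters, deployments, nodes).
	calls := []string{"clusters", "deployments", "nodes"}

	fmt.Println(ToolCallBounds{MaxToolCalls: 2}.Check(calls)) // old limit
	fmt.Println(ToolCallBounds{MaxToolCalls: 3}.Check(calls)) // new limit
}
```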
…criptions

Changes:
- Switch E2E agent from GPT-4o to Claude Sonnet 4.5 via Vertex AI
- Add enableAllTools: true to MCP config for auto-approval
- Configure gpt-5-nano as LLM judge for cost efficiency
- Improve CVE tool descriptions with clear WHEN TO USE/WHEN NOT TO USE sections
- Update test assertions to account for Claude's comprehensive CVE checking behavior
- Update run-tests.sh to export Vertex AI environment variables

The tool descriptions now explicitly guide when to use each CVE detection tool:
- General "clusters" queries → comprehensive check (all 3 tools)
- Specific component queries → single relevant tool only
- Single cluster queries → orchestrator tool with cluster filter

All 8 E2E tests passing with 24/24 assertions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
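As a rough illustration of the three usage patterns this commit describes, the Go sketch below encodes them as a decision table. Everything named here is hypothetical: the `QueryScope` values, the `Plan` type, and all tool name constants are invented for the example, and the source does not say which tool acts as the orchestrator. In the PR itself the guidance lives in the tool description text that the LLM reads, not in routing code.

```go
package main

import "fmt"

// QueryScope is a hypothetical classification of a user's CVE question.
type QueryScope int

const (
	ScopeAllClusters   QueryScope = iota // general "in my clusters" question
	ScopeComponent                       // question about one component type
	ScopeSingleCluster                   // question that names one cluster
)

// Hypothetical tool names, invented for this sketch.
const (
	toolClusters     = "cve_clusters"
	toolDeployments  = "cve_deployments"
	toolNodes        = "cve_nodes"
	orchestratorTool = "cve_orchestrator" // the source does not name this tool
)

// Plan lists the tools an agent is expected to call and an optional cluster filter.
type Plan struct {
	Tools         []string
	ClusterFilter string
}

// expectedPlan maps a query scope to the expected tool usage.
// target is a tool name for ScopeComponent and a cluster name for ScopeSingleCluster.
func expectedPlan(scope QueryScope, target string) Plan {
	switch scope {
	case ScopeAllClusters:
		return Plan{Tools: []string{toolClusters, toolDeployments, toolNodes}}
	case ScopeComponent:
		return Plan{Tools: []string{target}}
	case ScopeSingleCluster:
		return Plan{Tools: []string{orchestratorTool}, ClusterFilter: target}
	default:
		return Plan{}
	}
}

func main() {
	fmt.Printf("%+v\n", expectedPlan(ScopeAllClusters, ""))
	fmt.Printf("%+v\n", expectedPlan(ScopeComponent, toolNodes))
	fmt.Printf("%+v\n", expectedPlan(ScopeSingleCluster, "staging-central-cluster"))
}
```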
- Update README.md with complete env var configuration
- Fix jq command examples (path and property names)
- Add AGENT_MODEL_NAME configuration to run-tests.sh
- Clarify cluster ID-only requirement in tool descriptions
- Add explanatory comments to eval.yaml about assertion fields
- Improve list-clusters verification text
- Remove leftover mcp-testing-framework.yaml file

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
mtodor
reviewed
Feb 2, 2026
Collaborator
mtodor
left a comment
Looks good! I have added a few questions and thoughts. Nothing crucial.
I didn't review the tasks because we will replace them in a follow-up.
Co-authored-by: Mladen Todorovic <mtodor@gmail.com>
janisz
commented
Feb 2, 2026
- Upgrade from gevals v0.0.1 to mcpchecker v0.0.4
- Move e2e-tests Go module to tools/ subdirectory to fix module resolution issue when running MCP server from mcpchecker directory
- Rename gevals/ directory to mcpchecker/
- Update build script: build-gevals.sh → build-mcpchecker.sh
- Update all references in documentation and scripts
- Fix jq commands in README for new mcpchecker JSON structure
- Remove gevals dependency from root go.mod
- Add Dependabot configuration to monitor both root and e2e-tests/tools modules

All tests passing (8/8 tasks, 24/24 assertions).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add smoke test script that validates e2e test configuration without requiring actual agents or API keys. This allows CI to catch configuration errors early.

Changes:
- Add e2e-tests/scripts/smoke-test.sh to validate:
  - mcpchecker binary builds
  - MCP server compiles
  - YAML configuration files are valid
  - Task files exist and are parseable
- Add .github/workflows/e2e-smoke-test.yml for CI integration
- Update README with smoke test section

The smoke test runs in <30s and requires no secrets, making it ideal for PR validation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
mtodor
approved these changes
Feb 3, 2026
Collaborator
mtodor
left a comment
Nice work! 🏆
Added a few nitpicks: nothing crucial, and nothing we can't do in a follow-up.
- Merge e2e-smoke-test.yml into test.yml to eliminate duplicate builds
- Simplify smoke-test.sh to only build and verify mcpchecker binary
- Remove MCP server build from smoke test (already built by test workflow)
- Remove YAML validation from smoke test (will use yamllint in separate PR)
- Add Makefile target for e2e-smoke-test
- Add go mod tidy verification using find for all Go modules
- Use find for dependency downloads to support multiple modules

This addresses PR review feedback and reduces CI build time by avoiding duplicate checkout and build operations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Description
Enhanced tool descriptions and parameter schemas to better guide LLMs on when to use optional parameters and which tools to select for different query types. Added mcp-testing-framework configuration with 8 test cases covering CVE queries and cluster operations, achieving 87.5% pass rate with GPT-5 models.
Validation